Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors

Authors

  • Dmitri A. Dolgov
  • Edmund H. Durfee
Abstract

We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as an MDP with multiple discount factors in the objective function and constraints. We show that limiting search to stationary deterministic policies, coupled with a novel problem reduction to mixed integer programming, yields an algorithm for finding such policies that is computationally feasible, where no such algorithm has heretofore been identified. In the simpler case where the constrained MDP has a single discount factor, our technique provides a new way of finding an optimal deterministic policy, where previous methods could only find randomized policies. We analyze the properties of our approach and describe implementation results.
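As a rough sketch of the kind of reduction the abstract refers to, consider the single-discount-factor case: the standard occupancy-measure linear program can be augmented with binary action-selection variables so that only stationary deterministic policies are feasible. The notation below (occupancy variables x_{sa}, indicators \Delta_{sa}, discount \gamma, initial distribution \alpha, costs c_k, budgets \hat{c}_k) is introduced here for illustration and is not taken from the paper itself.

\begin{align*}
\max_{x,\Delta}\quad & \sum_{s,a} x_{sa}\, r(s,a) \\
\text{s.t.}\quad & \sum_{a} x_{sa} - \gamma \sum_{s',a'} p(s \mid s',a')\, x_{s'a'} = \alpha(s) \quad \forall s && \text{(flow conservation)} \\
& \sum_{s,a} x_{sa}\, c_k(s,a) \le \hat{c}_k \quad \forall k && \text{(resource/cost constraints)} \\
& x_{sa} \le \tfrac{1}{1-\gamma}\, \Delta_{sa} \;\; \forall s,a, \qquad \sum_{a} \Delta_{sa} \le 1 \;\; \forall s && \text{(determinism)} \\
& x_{sa} \ge 0, \qquad \Delta_{sa} \in \{0,1\}.
\end{align*}

Dropping the integrality requirement on \Delta recovers the usual linear program, whose solutions generally correspond to randomized policies; the binary indicators are what force at most one action per state and hence a deterministic policy.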


Related articles

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...


Constrained Markov Decision Models with Weighted Discounted Rewards

This paper deals with constrained optimization of Markov Decision Processes. Both objective function and constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g. in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (non...
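As a brief illustration of the weighted-discounted model this abstract describes (again with notation introduced here, not drawn from the paper): with reward streams r_k and constraint-cost streams c_{j,k}, each discounted by its own factor \beta_k, the problem can be written as

\begin{align*}
\max_{\pi}\quad & \sum_{k} \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \beta_k^{\,t}\, r_k(s_t,a_t) \right] \\
\text{s.t.}\quad & \sum_{k} \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \beta_k^{\,t}\, c_{j,k}(s_t,a_t) \right] \le \hat{c}_j \qquad \forall j,
\end{align*}

so both the objective and each constraint are sums of standard discounted criteria with possibly different discount factors.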


Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes

This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if the occupancy measure of this policy can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set and coinciding with th...


Variance minimization for constrained discounted continuous-time MDPs with exponentially distributed stopping times

This paper deals with minimization of the variances of the total discounted costs for constrained Continuous-Time Markov Decision Processes (CTMDPs). The costs consist of cumulative costs incurred between jumps and instant costs incurred at jump epochs. We interpret discounting as an exponentially distributed stopping time. According to existing theory, for the expected total discounted costs o...


Dantzig's pivoting rule for shortest paths, deterministic MDPs, and minimum cost to time ratio cycles

Dantzig’s pivoting rule is one of the most studied pivoting rules for the simplex algorithm. While the simplex algorithm with Dantzig’s rule may require an exponential number of pivoting steps on general linear programs, and even on min cost flow problems, Orlin showed that O(mn log n) Dantzig’s pivoting steps suffice to solve shortest paths problems (we denote the number of vertices...




Publication date: 2005